Early sample tagging and pooling enables simultaneous SARS-CoV-2 detection and variant sequencing

Description

barcode c002. Assuming that 1 in 100k reads hops between libraries, the probability that more than one read with the same UMI hops between libraries is extremely small. Retaining only UMIs supported by more than one read therefore makes barcode hopping negligible. Alternatively, if this filter is too restrictive, a slightly more sophisticated approach, in which a probabilistic model assigns each UMI to its most likely sample of origin, is straightforward, albeit more computationally intensive.
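The hopping argument can be checked with a back-of-the-envelope calculation. The following is a minimal sketch, assuming each read hops independently at the stated rate of 1 in 100k; the read depth per UMI (10) and all function names are hypothetical and chosen for illustration only. Because it ignores the further requirement that the hopped reads land in the same library, the result is an upper bound.

```python
from math import comb


def prob_at_least_k_hops(n_reads, hop_rate, k=2):
    """P(at least k of n_reads hop), assuming reads hop independently."""
    p_fewer = sum(
        comb(n_reads, i) * hop_rate**i * (1 - hop_rate) ** (n_reads - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


def keep_supported_umis(read_counts, min_reads=2):
    """Keep only UMIs supported by at least min_reads reads."""
    return {umi: n for umi, n in read_counts.items() if n >= min_reads}


HOP_RATE = 1e-5  # 1 in 100k reads, as stated in the text
READS_PER_UMI = 10  # hypothetical depth, for illustration only

p_single = prob_at_least_k_hops(READS_PER_UMI, HOP_RATE, k=1)
p_double = prob_at_least_k_hops(READS_PER_UMI, HOP_RATE, k=2)
print(f"at least one hopped read:  {p_single:.2e}")
print(f"at least two hopped reads: {p_double:.2e}")

# Hypothetical per-UMI read counts observed in one library
filtered = keep_supported_umis({"AACGT": 7, "GGTCA": 1, "TTAGC": 3})
print(sorted(filtered))
```

Under these assumptions, requiring two supporting reads lowers the chance of a hopped UMI passing the filter by several orders of magnitude relative to accepting single-read UMIs.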

Sequencing requirements
The sequencing requirement for viral detection using ApharSeq is a read of a single pool barcode (8-12 bp) plus either at least 50 bases from R1, or 20 bp from R1 (which covers the sample barcode and UMI) and 30 bp from R2. When we down-sampled the data in the titration experiment, we found that a 1000-fold change in sequencing depth incurs only a 2-fold reduction in sensitivity (Fig. 4D).
Additionally, using the Ct distribution observed in the clinic, we performed a simulation to estimate the false positive and false negative rates as a function of sequencing depth (Fig. S5). In these simulations, when sequencing depth ranged from 10,000 to 100,000 reads per sample, the false negative rate ranged from 4.5% to 0.2%, respectively. We conclude that 50,000 reads per sample on average should suffice. A parallel analysis that down-sampled the clinical test data showed that 25,000 reads per sample are sufficient (Fig. 6C). This means that a single Illumina NextSeq 500/550 run with 400 million reads suffices for processing 8,000-16,000 samples, and a NovaSeq S2 100 bp run with 8 billion reads allows the processing of 160,000-320,000 samples simultaneously.
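The sample capacities quoted above follow from dividing the run output by the required depth per sample. A minimal sketch of that arithmetic, assuming every read in the run is usable (no reads lost to quality filtering or spike-in controls); the function name is hypothetical:

```python
def samples_per_run(total_reads, reads_per_sample):
    """Number of samples that fit in one run at a given depth per sample."""
    return total_reads // reads_per_sample


# Run outputs as quoted in the text
RUNS = {"NextSeq 500/550": 400_000_000, "NovaSeq S2": 8_000_000_000}

capacity = {}
for name, total_reads in RUNS.items():
    low = samples_per_run(total_reads, 50_000)   # simulation-based depth
    high = samples_per_run(total_reads, 25_000)  # clinical down-sampling depth
    capacity[name] = (low, high)
    print(f"{name}: {low:,}-{high:,} samples")
```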

RT primer performance variation
After experiencing some issues with a batch of barcoded primers, we decided to test the primers for variation and contamination. We pooled 96 primers into a single pool twice, and hybridized them, as a pool, to a single positive and a single negative sample. If any of the primers is contaminated, a library should arise from the negative sample, and the source of the contamination should then be traced. Similarly, any variation in the number of UMIs observed indicates a barcode-specific issue, probably related to synthesis or to a step prior to the RT reaction. Another test for primer variation can examine differences in the distribution of reads per UMI, which would suggest PCR amplification biases, but since UMI counting should mitigate such differences, PCR biases are a lesser concern.
When we performed this test (Figure S4), we saw that most variation in UMI counts is primer-specific. We believe that this is mostly due to synthesis differences. More than 80% of primers fall within a ±25% range around the mean, indicating that while this issue should be addressed, it is not detrimental to the implementation of ApharSeq on a large scale.
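The uniformity criterion used here (the fraction of primers whose UMI count lies within ±25% of the mean) can be computed as in the following minimal sketch. The barcode names and UMI counts are made up for illustration, and the function names are hypothetical:

```python
from statistics import mean


def fraction_within(umi_counts, tolerance=0.25):
    """Fraction of primers whose UMI count is within ±tolerance of the mean."""
    m = mean(umi_counts)
    low, high = m * (1 - tolerance), m * (1 + tolerance)
    return sum(low <= c <= high for c in umi_counts) / len(umi_counts)


def outlier_barcodes(counts_by_barcode, tolerance=0.25):
    """Barcodes whose UMI count falls outside ±tolerance of the mean."""
    m = mean(counts_by_barcode.values())
    low, high = m * (1 - tolerance), m * (1 + tolerance)
    return sorted(bc for bc, c in counts_by_barcode.items() if not low <= c <= high)


# Hypothetical UMI counts per barcoded RT primer from the positive sample
counts = {"bc01": 100, "bc02": 110, "bc03": 95, "bc04": 40, "bc05": 105}

frac = fraction_within(list(counts.values()))
outliers = outlier_barcodes(counts)
print(frac, outliers)
```

The same counts table, computed on the negative sample, would flag contaminated primers: any barcode with a non-trivial UMI count there warrants follow-up.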

Contamination issues and best practices
While working on the development of the assay, we encountered significant PCR contamination issues.
Specifically, previous sequencing-ready libraries are a potential contaminant and are re-amplified in subsequent PCR reactions. Once we realized this was an issue, we could detect these contaminants experimentally, by including negative controls in the library PCR reactions, and computationally, by checking for reads with RT primers that were not used in the specific experiment and by comparing the UMI pools of previous and current libraries in cases where they shared the same RT primers.
For the time being, this issue seems to be under control in our lab. To achieve this, we separated the work space into pre-PCR and post-PCR areas. No reagent or device is used in both spaces, and there are dedicated lab coats for each space. Any processing performed on the final library (quantification by Qubit, TapeStation, etc.) is done only in the post-PCR space. See Aslanzadeh for more information (50).
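The two computational checks described above can be sketched as follows. This is an illustration under assumptions, not the pipeline used in the study: reads are assumed to be already demultiplexed into RT barcode and UMI strings, and all names and example values are hypothetical.

```python
def unexpected_barcode_fraction(read_barcodes, used_barcodes):
    """Fraction of reads carrying an RT barcode not used in this experiment."""
    if not read_barcodes:
        return 0.0
    unexpected = sum(bc not in used_barcodes for bc in read_barcodes)
    return unexpected / len(read_barcodes)


def shared_umi_fraction(current_umis, previous_umis):
    """Fraction of the current library's UMIs also seen in a previous library.

    Intended for libraries that share the same RT primer; a high overlap
    suggests carry-over of previously amplified material.
    """
    current = set(current_umis)
    if not current:
        return 0.0
    return len(current & set(previous_umis)) / len(current)


# Hypothetical example data
reads = ["bc01", "bc01", "bc02", "bc07", "bc02"]
used = {"bc01", "bc02"}
frac_unexpected = unexpected_barcode_fraction(reads, used)

current = ["AAAC", "GGTA", "TTCA", "CCGT"]
previous = ["GGTA", "ACGT"]
frac_shared = shared_umi_fraction(current, previous)

print(frac_unexpected, frac_shared)
```

What counts as a suspicious value depends on the UMI length and library complexity, since a small overlap between independent libraries is expected by chance.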
Additionally, we observed contamination between primers in the synthesized oligo plate we received from the manufacturer. We highly recommend testing each plate as it arrives for primer contamination and uniformity (see Figure S4, S6 above). This issue seemed to have been resolved when we ordered the oligos in dry form and dissolved them in the lab.

Required equipment and plastic consumables
Preparation and consumables per plate:
- 25 µl of 10 µM barcoded primers are distributed with 25 µl binding buffer to a 96-well PCR plate and stored at -20 °C
- 500 µl wash A is distributed to a 2 ml deep-well plate
- Samples are ordered in a 2 ml deep-well plate by a dedicated robotic system
- A single 96-well 150 µl filter tipbox per plate is used (Tecan)

RNA cleanup and hybridization
This step is performed by the MCA on the robotic system using a single tip box and takes ~40 minutes. It ends with the pool of the 96 samples collected in a single well of the plate, in RNAlater for long-term storage.
Our robotic setup includes an Evo robotics system by Tecan with:
In terms of infrastructure costs, the main device uniquely required for ApharSeq is a sequencing machine.

A note on RNA capture methods
Since we aimed for an RNA-seq based detection assay, we first needed to extract RNA from the lysed clinical samples with standard, high-throughput nucleic acid cleanup techniques. We tried three different approaches: polyT paramagnetic microbeads (commercial and home-made variants), paramagnetic microbeads conjugated with viral-specific bait oligos, and SPRI beads for general nucleic acid cleanup.
The viral-specific beads yielded poor results, and we discontinued this experimental approach. The polyT and SPRI approaches gave high viral RNA yields, comparable to common RNA extraction procedures (Figure 1A), and were compatible with standard downstream RT-qPCR kits. We performed extensive tests on SPRI extraction (Figure S1) and several library preparation tests based on the SPRI cleanup (Figure S1). These tests demonstrate that it is a viable alternative to the homemade polyT beads we are currently using, and that with further optimization it might even provide higher sensitivity (Figure 1A). Both techniques are based on the Sera-Mag SpeedBeads modified with a carboxylate residue by GE Healthcare (Cat# 65152105050250). Both variants allow large-scale batch preparation and long-term storage and usage.
See Rahat et al. (21) for more details on the SPRI protocol, and the "Bead conjugation" and "Hybridization and RNA purification" sections in the methods part of this manuscript.

We sample the number of molecules per sample and assume that Ct 26 is ~11,000 molecules per assay (~35,000/ml (31)). A UMI is sampled for each molecule (assuming uniform synthesis), and sequencing errors are introduced. The molecules are pooled, and the total number of reads assigned to the pool (e.g. 96,000 reads in the case of 1,000 reads per sample) is sampled from that pool with replacement (accounting for PCR amplification). UMIs are collapsed, and unique molecules are counted. The detection threshold was set to arrive at a false positive rate of 1 in 1000 (assuming the background is correct), and the false negative rate is determined empirically as the fraction of samples that are positive but have fewer observed molecules than the detection threshold.
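The simulation steps above can be illustrated with a small sketch. It is not the code used in the study and makes several labeled assumptions: template abundance halves with each additional Ct cycle, UMIs are 10 bases long, errors are independent base substitutions, UMIs are collapsed by exact match, and the Ct values and detection threshold are hypothetical. The background model used to set the false positive rate is not reproduced, so negative samples yield zero counts here.

```python
import random
from collections import defaultdict

MOLECULES_AT_CT26 = 11_000  # from the text: Ct 26 is ~11,000 molecules per assay
UMI_LENGTH = 10             # hypothetical UMI length, for illustration only
BASES = "ACGT"


def molecules_from_ct(ct):
    """Assumption: template abundance halves with each additional Ct cycle."""
    return round(MOLECULES_AT_CT26 * 2 ** (26 - ct))


def add_errors(umi, error_rate, rng):
    """Substitute each base independently with probability error_rate."""
    return "".join(
        rng.choice(BASES.replace(b, "")) if rng.random() < error_rate else b
        for b in umi
    )


def simulate_pool(cts, reads_per_sample, rng, error_rate=0.0):
    """Return the number of unique UMIs observed per sample in one pool.

    cts holds one Ct value per sample; None marks a negative sample.
    """
    pool = []
    for sample, ct in enumerate(cts):
        if ct is None:
            continue
        for _ in range(molecules_from_ct(ct)):
            umi = "".join(rng.choice(BASES) for _ in range(UMI_LENGTH))
            pool.append((sample, umi))

    observed = defaultdict(set)
    n_reads = reads_per_sample * len(cts)
    if pool:
        # Sampling with replacement stands in for PCR amplification.
        for sample, umi in rng.choices(pool, k=n_reads):
            observed[sample].add(add_errors(umi, error_rate, rng))
    # Collapse UMIs by exact match and count unique molecules per sample.
    return [len(observed[s]) for s in range(len(cts))]


def false_negative_rate(counts, cts, threshold):
    """Fraction of positive samples with fewer unique UMIs than threshold."""
    positives = [c for c, ct in zip(counts, cts) if ct is not None]
    if not positives:
        return 0.0
    return sum(c < threshold for c in positives) / len(positives)


rng = random.Random(0)
example_cts = [30, 33, 36, None, 31, None, 35, 34]  # hypothetical pool
counts = simulate_pool(example_cts, reads_per_sample=1_000, rng=rng, error_rate=0.001)
fnr = false_negative_rate(counts, example_cts, threshold=5)  # hypothetical threshold
print(counts, fnr)
```

Repeating this over many pools drawn from a clinical Ct distribution, at several values of `reads_per_sample`, would give a false negative rate as a function of sequencing depth.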

ApharSeq instantiation
The following is a specific instance of the protocol:

D) SPRI-based extraction and ApharSeq on the viral E amplicon.
Purple/teal indicate positive samples that were pooled/unpooled, respectively; green/red are matched negative samples that were pooled/unpooled, respectively. Pooled negative samples have fewer unique molecules than their positive counterparts, indicating that cross-contamination occurs to a minimal degree that can likely be reduced further.

Supplementary Figure 2: Preliminary Optimizations
All tests were performed on the E amplicon using a primer-specific qPCR reaction (as shown in Figure 3A).
B) RNA melting temperature. Sample preheating is crucial (relative to 25 °C), but any temperature above 55 °C appears to yield a similar amount of product.
C) RT reaction. Reaction conditions were tested relative to the manufacturer's "base" conditions. Signal vs. background (dark red vs. dark blue) yield improved significantly in the MgCl2 + DMSO condition.

Supplementary Figure 3: Human Control Amplicon
A) All the targets tested on a pool of negative samples (extension of Figure 5C). "a"/"b" indicate different primer pairs on the same transcript.
B) Addition of human RNA. Same primers used on the pool of negative samples supplemented with 1 µg of RNA extracted from HEK cells.

C) Attenuating relative amplicon abundance.
The target-specific PCR primer concentration was varied in the library PCR to alter amplicon abundance. The assay is a qPCR test that is target- and library-specific. There is a dose-response relationship between primer abundance and product abundance (height of blue series, green series). Note that different amplicons appear to alter the E amplicon yield (height of orange bars).

Supplementary Figure 4: RT barcode variations
We pooled 96 differently-barcoded RT primers twice ("A"/"B") and hybridized these pools in two replicates to a positive sample ("1"/"2"). Plotted are the number of UMIs observed in each pool/replicate.
Each dot is a specific primer. Primers are highly correlated between pools, suggesting that most variation is due to intrinsic properties of the primer or synthesis variation issues. In all replicates, >80% of barcodes are within a ±25% range around the mean.

B) Undetected ActB in samples (i.e. dropout samples) in both techniques is similar (5/8).

Table S3. DNA Oligos used in this study.
Provided as an external Excel spreadsheet.